# UK Patent Application (19) GB (11) 2 062 310 A

- (21) Application No 8031666
- (22) Date of filing 1 Oct 1980
- (30) Priority data
- (31) 54/125316
- (32) 1 Oct 1979
- (33) Japan (JP)
- (43) Application published 20 May 1981
- (51) INT CL<sup>3</sup> G06F 7/52
- (52) Domestic classification G4A 18F 286 2BY MD
- (56) Documents cited GB 1497863 GB 1496935 GB 1216559 GB 1195410 GB 1076186

US 3752971A

- (58) Field of search G4A
- (71) Applicant
  Tokyo Shibaura Denki
  Kabushiki Kaisha,
  72 Horikawa-cho, Saiwaiku, Kawasaki-shi, Japan

- (72) Inventors Masahide Ohhashi, Hisao Yanagi
- (74) Agent Marks & Clerk, 57—60 Lincoln's Inn Fields, London WC2A 3LS

### (54) Binary Multiplication Cell Circuit

(57) A binary multiplier includes a staggered array of multiplier cells each including a NOR circuit (1) for obtaining a partial product of one binary digit of a multiplicand x<sub>1</sub> and

one binary digit, of a multiplier y, and a full adder arranged to produce from a) the partial product, b) an augend digit from the preceding row of the array, and c) a carry digit from the preceding cell in the same row, new carry and augend (or product) digits. The full adder comprises two AND circuits (2, 8), three NOR circuits (3, 4, 9), an inverter (7) and an exclusive OR circuit (5, 6) arranged as shown. Preferably, the exclusive OR circuit is constituted by an exclusive NOR circuit (5) and an inverter (6). The arrangement reduces the number of transistors required for each cell and thus facilitates production as an integrated circuit.

The drawings originally filed were informal and the print here reproduced is taken from a later filed formal copy.





FIG. 2 
$$c_i = \begin{bmatrix} x_i & y_j & c_{i-1} \\ x_i & y_j & y_j \end{bmatrix}$$

FIG. 3



FIG. 4



#### **SPECIFICATION Binary Multiplication Cell Circuit**

This invention relates to a binary multiplication cell circuit which may easily be formed of MOS

Computers achieve multiplication of binary numbers, as well as addition and subtraction of binary numbers. Binary multiplication is carried out usually in such way as shown in Fig. 1.

That is, a multidigit multiplicand X4 X3 X2 X1 is 10 mutliplied successively by the individual digits of a multidigit multiplier Y4 Y3 Y2 Y1. Blocks a shown in Fig. 1 correspond to individual multiplication cell circuits. Arrows extending downward represent addition, and arrows extending to the left indicate carry.

Fig. 2 illustrates input and output states of the multiplication cell circuit a. i and j are each anyone of integral numbers. Supplied to the cell circuit a are a multiplicand digit xi, a multiplier digit Yj, an augend Bi-1 obtained by a cell circuit immediately above the cell circuit a and a carry digit Ci-1 obtained by a cell circuit immediately right to the cell circuit a. The cell circuit a conducts the following arithmetic operations:

$$Ai=Xi \cdot Yj$$
 (1)

$$Ci=Ai \cdot Bi-1+Bi-1 \cdot Ci-1+Ci-1 \cdot Ai$$
 (3)

Thus each multiplication cell circuit a produces 30 a partial product Ai, an augend Bi to be supplied to a cell circuit immediately below the cell circuit a and a carry digit Ci to be supplied to a cell circuit immediately left to the cell circuit a.

Arithmetic operations (1), (2) and (3) are 35 carried out by logic circuits. The logic circuits may 100 be constituted by two half-adders and one logic gate for producing a partial product Ai. If typical half-adders are used, the cell circuit a must be provided with more gates, namely more MOS transistors. In case the multiplication circuit is fabricated in the form of an LSI circuit, it is desired that each of its cell circuits should be comprised of as less gates as possible, namely as less MOS transistors as possible.

An object of this invention is to provide a binary multiplication cell circuit which has a decreased number of elements and which is therefore suitable for an integrated circuit version.

A binary multiplication cell circuit of this 50 invention comprises a first NOR circuit for providing a partial product of one binary digit of a multiplicand and one binary digit of a multiplier, a first AND circuit connected to receive an output signal of the first NOR circuit and an augend, a 55 second NOR circuit connected to receive the augend and an output signal of the first NOR circuit, a third NOR circuit connected to receive an output signal of the first AND circuit and an output signal of the second NOR circuit, an

AND circuit connected to receive an output signal of the inverter and an output signal of the third NOR circuit, a fourth NOR circuit connected to receive an output signal of the second AND circuit 65 and the output signal of the second NOR circuit, and an exclusive OR circuit connected to receive the carry binary digit and the output signal of the third NOR circuit.

Preferably, the exclusive OR circuit is 70 constituted by an exclusive NOR circuit and an inverter.

This invention can be more fully understood from the following detailed description when taken in conjunction with the accompanying 75 drawings, in which:

Fig. 1 illustrates how a binary multiplication is carried out;

Fig. 2 shows input and output states of a multiplication cell circuit;

Fig. 3 is a block diagram of a binary multiplication cell circuit of this invention; and

Fig. 4 shows a practical circuit arrangement of the binary multiplication cell circuit of Fig. 3 which is comprised of N-channel MOS transistors.

Now referring to Fig. 1, a binary multiplication cell circuit of this invention will be described. Xi designates a signal obtained by inverting each binary digit of a multiplicand, Yj a signal obtained by inverting each binary digit of a multiplier, Bi-1 90 an augend, Ci-1 a carry digit, Bi a result of multiplication (or augend to supply to the next binary multiplication cell circuit), and Ci a carry binary digit. The signal Xi is commonly supplied to the cell circuits XiY<sub>1</sub>, XiY<sub>2</sub>, XiY<sub>3</sub> and XiY<sub>4</sub> of Fig. 1. The signal Yj is commonly supplied to the cell circuits X<sub>1</sub>Yj, X<sub>2</sub>Yj, X<sub>3</sub>Yj and X<sub>4</sub>Yj of Fig. 1.

The signals Xi and Yj are supplied also to a NOR circuit 1, which produces a partial product Ai (=Xi · Yj) which is represented by equation (1). The partial product Ai and augend Bi-1 are applied to an AND circuit 2 and a NOR circuit 3. The AND circuit 2 produces Ai · Bi-1, and the NOR circuit 3 produces Ai+Bi-1. These Ai · Bi-1 and Ai+Bi-1 are applied to a NOR 105 circuit 4, which provides Ai⊙BI-1. Ai⊙Bi-1 and a carry digit Ci-1 from the preceding binary multiplication cell circuit (not shown) are supplied to an exclusive NOR circuit 5, which forms Ai⊙Bi-1⊙Ci-1. Ai⊙Bi-1⊙Ci-1 is inverted by an inverter 6 to produce Bi

110 (=Ai⊕Bi-1⊕Ci-1) represented in equation (2), i.e. an augend to be added to the next cell circuit or one binary digit of the product of the two binary numbers.

115 The exclusive NOR circuit 5 and the inverter 5 may be constituted by an exclusive OR circuit.

The carry digit Ci-1 from the preceding cell circuit is inverted by an inverter 7. The carry digit Ci-1 thus inverted and the output Ai⊕Bi-1 of the 120 NOR circuit 4 are applied to an AND circuit 8, which produces a signal indicative of (Ai⊙Bi-1) · Ci-1. This signal (Ai⊙Bi-1) · Ci-1 and the output signal Ai⊕Bi-1 of the NOR circuit 3 are supplied to a NOR circuit 9, thereby 60 inverter for inverting a carry binary digit, a second 125 producing a carry digit Ci to be added to the next

binary multiplication cell circuit.

Carry digit Ci may be expressed as follows:

$$Ci = \overrightarrow{Ai} \cdot \overrightarrow{Bi} - 1 + \overrightarrow{Ai} \cdot \overrightarrow{Bi} - 1 \cdot \overrightarrow{Ci} - 1 + \overrightarrow{Ai} \cdot \overrightarrow{Bi} - 1 \cdot \overrightarrow{Ci} - 1$$

$$= \overrightarrow{Bi} - 1 (\overrightarrow{Ai} + \overrightarrow{Ci} - 1) + \overrightarrow{Ai} (\overrightarrow{Bi} - 1 + \overrightarrow{Ci} - 1)$$

$$= \overrightarrow{Ai} \cdot \overrightarrow{Bi} - 1 + \overrightarrow{Ai} \cdot \overrightarrow{Ci} \cdot 1 + \overrightarrow{Bi} - 1 \cdot \overrightarrow{Ci} - 1$$

$$= (\overrightarrow{Ai} + \overrightarrow{Bi} - 1)(\overrightarrow{Ai} + \overrightarrow{Ci} - 1)(\overrightarrow{Bi} - 1 + \overrightarrow{Ci} - 1)$$

$$= \overrightarrow{Ai} \cdot \overrightarrow{Bi} - 1 + \overrightarrow{Bi} - 1 \cdot \overrightarrow{Ci} - 1 + \overrightarrow{Ci} - 1 \cdot \overrightarrow{Ai}$$

Fig. 4 is a circuit diagram of the binary multiplication cell circuit of Fig. 1, which is 10 formed of N-channel insulated gate field effect transistors. The cell circuit comprises depletion load transistors 11 to 17 and enhancement transistors 18 to 31. The NOR circuit 1 is comprised of the transistors 11. 18 and 19, the 15 NOR circuit 3 is comprised of the transistors 15, 26 and 27, the exclusive NOR circuit 5 is comprised of the transistors 13, 23 and 24, the inverter 6 is comprised of the transistors 14 and 25, and the inverter 7 is comprised of transistors 20 16 and 28. The AND circuit 2 and the NOR circuit 4 are constituted by the transistors 12, 20, 21 and 22. The AND circuit 8 and the NOR circuit 9 is comprised of the transistors 17, 29, 30 and 31.

The circuit of Fig. 3, except for the NOR circuit

1 which produces a partial product Ai, constitutes
a full adder. If a binary multiplication cell circuit is
to be comprised of one NOR circuit for producing
a partial product and two typical half-adders
constituting a full adder, approximately 30

transistors will be required. By contrast, the binary

multiplication cell circuit shown in Fig. 4 requires but 21 transistors.

In the above-mentioned embodiment of this invention the power dissipation can be reduced without using more transistors, only if the inverters 6 and 7 are replaced each by a two-transistor complementary inverter.

#### Claims

1. A binary multiplication cell circuit 40 comprising:

a first NOR circuit for providing a partial product of one binary digit of a multiplicand and one binary digit of a multiplier;

a first AND circuit connected to receive an 45 output signal of said first NOR circuit and an augend:

a second NOR circuit connected to receive said augend and the output signal of said first NOR circuit:

a third NOR circuit connected to receive an output signal of said first AND circuit and an output signal of said second NOR circuit;

an inverter for inverting a carry binary digit;
a second AND circuit connected to receive an

55 output signal of said first inverter and an output signal of said third NOR circuit;

a fourth NOR circuit connected to receive an output signal of said second AND circuit and the output signal of said second NOR circuit; and

an exclusive OR circuit connected to receive the carry binary digit and the output signal of said third NOR circuit.

- A binary multiplication cell circuit according to claim 1, wherein said exclusive OR circuit is constituted by an exclusive NOR circuit and an inverter.
- A binary multiplication cell circuit, substantially as hereinbefore described with reference to the accompanying drawings.

Printed for Her Majesty's Stationery Office by the Courier Press, Learnington Sps, 1981. Published by the Patent Office. 25 Southampton Buildings, London, WC2A 1AY, from which copies may be obtained.

50

60

## (12) UK Patent Application (19) GB (11) 2 365 637 (13) A

(43) Date of A Publication 20.02.2002

(21) Application No 0101961.1

(22) Date of Filing 25.01.2001

(30) Priority Data (31) 0019287

(32) 04.08.2000

(33) GB

(71) Applicant(s)

Automatic Parallel Designs Limited (Incorporated in the United Kingdom) 7200 The Quorum, Alec Issigonis Way, Oxford Business Park North, OXFORD, OX4 2JZ, United Kingdom

(72) Inventor(s)

Dmitrly Rumynin

Sunil Talwar

Peter Meulemans

(74) Agent and/or Address for Service

Marks & Clerk

57-60 Lincoln's Inn Fields, LONDON, WC2A 3LS,
United Kingdom

(51) INT CL<sup>7</sup> G06F 7/60

(52) UK CL (Edition T )
H3P PHX
G4A AGB AMD
H3T T2B3

(58) Field of Search
UK CL (Edition S ) G4A AGB AMD , H3P PHX
INT CL<sup>7</sup> G06F 7/50 7/52 7/60 , H03K 19/20 19/23
ONLINE: ELSEVIER, EPODOC, IEL, INSPEC, JAPIO,

(54) Abstract Title
Parallel counter and multiplication logic circuit

(57) A parallel counter comprises a plurality of binary inputs and a plurality of binary outputs. The binary code outputted to the outputs indicates the number of binary ones in the plurality of inputs. A logic circuit is connected between the inputs and outputs and generates each of the binary outputs as a symmetric function of the binary inputs. The logic circuit is divided into a plurality of logic units, wherein each unit is arranged to generate logic outputs as a symmetric function of the inputs to that unit. A logic circuit for multiplying two N bit binary numbers is also disclosed, comprising array generation logic for performing an AND operation between respective bits of the two numbers to form an array of binary values, and for further logically combining values to generate an array of a maximum depth below N bits. Array reduction logic is used to reduce the depth of the array to two binary numbers and addition logic is used for adding the binary values of the two binary numbers. The array reduction logic may include at least one of: at least one full adder, at least one half adder and at least one parallel counter. A method of designing a logic circuit is also claimed, including providing a library of logic module designs, each for performing a small symmetric function, and designing a logic circuit to perform a large symmetric function.

### PRIOR ART



Fig 1

## PRIOR ART



Fig 2



Fig 3

OR\_3\_1



Fig 4

## OR\_4\_1



Fig 5

OR\_5\_1



Fig 6

7/26

EXOR\_7\_1



Fig 7



Fig 8



Fig 9



Fig 10



Fig 11



Fig 12



Fig 13



Fig 14



Fig 15



Fig 16



Fig 17



Fig 18



Fig 19



Fig 20



Fig 21

### **PRIOR ART**



Fig 22

#### 23/26

### **PRIOR ART**



Fig 23



Fig 24

 $A_7B_0 \ A_6B_0 \ A_5B_0 \ A_4B_0 \ A_3B_0 \ A_2B_0 \ A_1B_0 \ A_0B_0$ 

 $Z=A_7B_1\ A_6B_1\ A_5B_1\ A_4B_1\ A_3B_1\ A_2B_1\ A_1B_1\ A_0B_1$ 

```
A7B2 A6B2 A3B2 A3B2 A3B2 A2B2 A1B2 A0B2
A7B3 A6B3 A3B3 A3B3 A3B3 A2B3 A1B3 A0B3
A7B4 A6B4 A3B4 A3B4 A3B4 A2B4 A1B4 A0B4
A7B5 A6B5 A3B5 A4B5 A3B5 A2B5 A1B5 A0B5
A7B6 A6B6 A5B6 A4B6 A3B6 A2B6 X A0B6
A7B7 A6B7 A3B7 A4B7 A3B7 A2B7 Y

using 1 HA. 3 FA-s, and 8 higher parallel counters

using 9 FA-s and 3 HA-s
```

Fig 25



Fig 26

#### A PARALLEL COUNTER AND A MULTIPLICATION LOGIC CIRCUIT

The present invention generally relates to digital electronic devices and more particularly to a digital electronic device performing binary logic. In one aspect the present invention relates to a parallel counter and in another aspect the present invention relates to a multiplication logic circuit for multiplying two binary numbers.

It is instrumental for many applications to have a block that adds n inputs together. An output of this block is a binary representation of the number of high inputs. Such blocks, called parallel counters (L. Dadda, Some Schemes for Parallel Multipliers, Alta Freq 34: 349-356 (1965); E. E. Swartzlander Jr., Parallel Counters, IEEE Trans. Comput. C-22: 1021-1024 (1973)), are used in circuits performing binary multiplication. There are other applications of a parallel counter, for instance, majority-voting decoders or RSA encoders and decoders. It is important to have an implementation of a parallel counter that achieves a maximal speed. It is known to use parallel counters in multiplication (L. Dadda, On Parallel Digital Multipliers, Alta Freq 45: 574-580 (1976)).

A full adder is a special parallel counter with a three-bit input and a two-bit output. A current implementation of higher parallel counters i.e. with a bigger number of inputs is based on using full adders (C. C. Foster and F. D. Stockton, Counting Responders in an Associative Memory, IEEE Trans. Comput. C-20: 1580-1583 (1971)). In general, the least significant bit of an output is the fastest bit to produce in such implementation while other bits are usually slower.

The following notation is used for logical operations:

⊕ - Exclusive OR;

∨ - OR;

 $\wedge$  - AND;

¬ - NOT.

An efficient prior art design (Foster and Stockton) of a parallel counter uses full adders. A full adder, denoted FA, is a three-bit input parallel counter shown in figure 1. It has three inputs  $X_1$ ,  $X_2$ ,  $X_3$ , and two outputs S and C. Logical expressions for outputs are

$$S = X_1 \oplus X_2 \oplus X_3,$$

$$C = (X_1 \wedge X_2) \vee (X_1 \wedge X_3) \vee (X_2 \wedge X_3).$$

A half adder, denoted HA, is a two bit input parallel counter shown in figure 1. It has two inputs  $X_1$ ,  $X_2$  and two outputs S and C. Logical expressions for outputs are

$$S = X_1 \oplus X_2$$

$$C = X_1 \wedge X_2$$

A prior art implementation of a seven-bit input parallel counter illustrated in figure 2.

Multiplication is a fundamental operation. Given two n-digit binary numbers

$$A_{n-1}2^{n-1}+A_{n-2}2^{n-2}+...+A_12+A_0$$
 and  $B_{n-1}2^{n-1}+B_{n-2}2^{n-2}+...+B_12+B_0$ ,

their product

$$P_{2n-1}2^{2n-1} + P_{2n-2}2^{2n-2} + \ldots + P_12 + P_0$$

may have up to 2n digits. Logical circuits generating all P<sub>i</sub> as outputs generally follow the scheme in figure 14. Wallace has invented the first fast architecture for a multiplier, now called the Wallace-tree multiplier (Wallace, C. S., A Suggestion for a Fast Multiplier, IEEE Trans. Electron. Comput. EC-13: 14-17 (1964)). Dadda has investigated bit behaviour in a multiplier (L. Dadda, Some Schemes for Parallel Multipliers, Alta Freq 34: 349-356 (1965)). He has constructed a variety of multipliers and most multipliers follow Dadda's scheme.

Dadda's multiplier uses the scheme in on figure 22. If inputs have 8 bits then 64 parallel AND gates generate an array shown in figure 23. The AND gate sign  $\wedge$  is omitted for clarity so that  $A_i \wedge B_j$  becomes  $A_i B_j$ . The rest of figure 23 illustrates array reduction that involves full adders (FA) and half adders (HA). Bits from the same column are added by half adders or full adders. Some groups of bits fed into a full adder are in rectangles. Some groups of bits fed into a half adder are in ovals. The result of array reduction is

just two binary numbers to be added at the last step. One adds these two numbers by one of the fast addition schemes, for instance, conditional adder or carry-look-ahead adder.

In accordance with the first aspect the present invention provides a parallel counter based on algebraic properties of symmetric functions. Each of the plurality of binary output bits is generated as a symmetric function of a plurality of binary input bits.

The symmetric functions comprise logically AND combining sets of one or more binary inputs and logically OR or exclusive OR logic combining the logically combined sets of binary inputs to generate a binary output. The OR and the exclusive OR symmetric functions are elementary symmetric functions and the generated output binary bit depends only on the number of high inputs among the input binary bits. For the OR symmetric function, if the number of high inputs is m, the output is high if and only if m ≥k, where k is the size of the sets of binary inputs. Similarly, the generated output binary bit using the exclusive OR symmetric function is high if and only if m ≥k and the number of subsets of inputs of the set of high inputs is an odd number. In one embodiment the size of the sets can be selected. The i<sup>th</sup> output bit can be generated using the symmetric function using exclusive OR logic by selecting the set sizes to be of size 2<sup>i</sup>, where i is an integer from 1 to N, N is the number of binary outputs. and i represents the significance of each binary output.

In one embodiment the sets of binary inputs used in the symmetric functions are each unique and they cover all possible combinations of binary inputs.

This embodiment reduces the amount of fan-out in the circuit and increases the amount of logic sharing. It thus makes parallel counters for a large binary number more practicable.

In another embodiment of the present invention, the logic and inputs of the parallel counter are divided in accordance with a binary tree. The logic circuit is divided into a plurality of logic units. Each logic unit is arranged to generate logic unit binary outputs

as a symmetric function of the binary inputs to the logic unit. The binary inputs are divided into inputs into the plurality of logic units, and the binary outputs of the plurality of outputs are generated using binary outputs of a plurality of the logic units.

In a preferred embodiment, each of the logic units is arranged to receive 2<sup>n</sup> of the binary inputs, where n is an integer indicating the level of the logic units in the binary tree, the logic circuit has m logic units at each level, where m is a rounded up integer determined from (the number of binary inputs)/2<sup>n</sup>, logic units having a higher level in the binary tree comprise logic of logic units at lower levels in the binary tree, and each logic unit is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to the logic unit.

In one embodiment, each logic unit at the first level is arranged to generate logic unit binary outputs as a smallest elementary symmetric function of the binary inputs to said logic circuit.

In one embodiment, each logic unit at the first level is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to the logic circuit using OR logic for combining the binary inputs.

In one embodiment, each logic unit at the first level is arranged to logically AND each of the binary inputs to the logic unit and to logically OR each of the binary inputs to the logic unit to generate the logic unit binary outputs.

In one embodiment, each logic unit at the first level is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to the logic circuit using exclusive OR logic for combining the binary inputs.

In one embodiment, each logic unit at the first level is arranged to logically AND each of the binary inputs to the logic unit and to logically exclusively OR each of the binary inputs to the logic unit to generate the logic unit binary outputs.

In one embodiment, elementary logic units are provided as the logic units at the first level for performing elementary symmetric functions, outputs from each of two primary elementary logic units receiving four logically adjacent binary inputs from said plurality of inputs are input to two secondary elementary logic units, an output from each of the secondary elementary logic units is input to a tertiary elementary logic unit, and the primary, secondary and tertiary elementary logic units form a secondary logic unit at a second level of the binary tree having a binary output comprising a binary output from each of the secondary elementary logic units and two binary outputs from the tertiary elementary logic unit.

In one embodiment, tertiary logic units at a third level of the binary tree each comprise two secondary logic units receiving eight logically adjacent binary inputs from the plurality of inputs, four elementary logic units receiving as inputs the outputs of the two secondary logic units, and further logic for generating binary outputs as a symmetric function of the binary inputs to the tertiary logic unit using the binary outputs of the four elementary logic units.

In one embodiment, quaternary logic units at a fourth level of the binary tree each comprise two tertiary logic units receiving sixteen logically adjacent binary inputs from the plurality of inputs, four elementary logic units receiving as inputs the outputs of the two tertiary logic units, and further logic for generating binary outputs as a symmetric function of the binary inputs to the quaternary logic unit using the binary outputs of the four elementary logic units.

In one embodiment, elementary logic units are provided as the logic units at the first level for performing the smallest elementary symmetric functions, and logic units for higher levels comprise logic units of lower levels.

In one embodiment, the logic units for higher levels above the second level comprise logic units of an immediately preceding level and elementary logic units.

In one embodiment, each logic unit at each level is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to the logic circuit using OR logic for combining the binary inputs.

In one embodiment, each logic unit at each level is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to the logic circuit using exclusive OR logic for combining the binary inputs.

In one embodiment of the present invention, each of the binary outputs can be generated using a symmetric function which uses exclusive OR logic. However, exclusive OR logic is not as fast as OR logic.

In accordance with another embodiment of the present invention at least one of the binary outputs is generated as a symmetric function of the binary inputs using OR logic for combining a variety of sets of one or more binary inputs. The logic is arranged to logically AND members of each set of binary inputs and logically OR the result of the AND operations.

Thus use of the symmetric function using OR logic is faster and can be used for generation of the most significant output bit. In such an embodiment the set size is set to be 2<sup>N-1</sup>, where N is the number of binary outputs and the N<sup>th</sup> binary output is the most significant.

It is also possible to use the symmetric function using OR logic for less significant bits on the basis of the output value of a more significant bit. In such a case, a plurality of possible binary outputs for a binary output less significant than the N<sup>th</sup> are generated as symmetric functions of the binary inputs using OR logic for combining a plurality of sets of one or more binary inputs, where N is the number of binary outputs. Selector logic is provided to select one of the possible binary outputs based on a more significant binary output value. The size of the sets used in such an arrangement for the (N-1)<sup>th</sup> bit

is preferably  $2^{N-1} + 2^{n-2}$  and  $2^{N-2}$  respectively and one of the possible binary outputs is selected based on the  $N^{th}$  binary output value.

In one embodiment of the present invention the circuit is designed in a modular form. A plurality of subcircuit logic modules are designed, each for generating intermediate binary outputs as a symmetric function of some of the binary inputs. Logic is also provided in this embodiment for logically combining the intermediate binary outputs to generate a binary outputs.

Since OR logic is faster, in a preferred embodiment the subcircuit logic modules implement the symmetric functions using OR logic. In one embodiment the subcircuit modules can be used for generating some binary outputs and one or more logic modules can be provided for generating other binary outputs in which each logic module generates a binary output as a symmetric function of the binary inputs exclusive OR logic for combining a plurality of sets of one or more binary inputs.

Another aspect of the present invention provides a method of designing a logic circuit comprising: providing a library of logic module designs each for performing a small symmetric function; designing a logic circuit to perform a large symmetric function; identifying small symmetric functions which can perform said symmetric function; selecting logic modules from said library to perform said small symmetric functions; identifying a logic circuit in the selected logic circuit which performs a symmetric function and which can be used to perform another symmetric function; selecting the logic circuit corresponding to the identified symmetric function and using the selected logic circuit with inverters to perform said other symmetric function using the relationship between the symmetric functions:

OR 
$$n k(X_1...X_n) = \neg OR n (n+1-k)(\neg X_1...\neg X_n)$$

where  $\neg$  denotes an inversion, n is the number of inputs, and k is the number of sets of inputs AND combined together.

Another aspect of the present invention provides a conditional parallel counter having m possible high inputs out of n inputs, where m<n, and n and m are integers, the counter comprising the parallel counter for counting inputs to generate p outputs for m inputs, wherein the number n of inputs to the counter is greater than 2<sup>p</sup>, where p is an integer.

Thus these aspects of the present invention provide a fast circuit that can be used in any architecture using parallel counters. The design is applicable to any type of technology from which the logic circuit is built.

The parallel counter in accordance with this aspect of the present invention is generally applicable and can be used in a multiplication circuit that is significantly faster than prior art implementations.

In accordance with another aspect of the present invention a technique for multiplying 2N bit binary numbers comprises an array generation step in which an array of logical combinations between the bits of the two binary numbers is generated which is of reduced size compared to the prior art.

In accordance with this aspect of the present invention, a logic circuit for multiplying 2N bit numbers comprises array generation logic for performing the logical AND operation between each bit in one binary bit and each bit in the other binary number to generate an array of logical AND combinations comprising an array of binary values, and for further logically combining logically adjacent values to reduce the maximum depth of the array to below N bits; array reduction logic for reducing the depth of the array to two binary numbers; and addition logic for adding the binary values of the two binary numbers.

When two binary numbers are multiplied together, as is conventional, each bit  $A_i$  of the first binary number is logically AND combined with each bit  $B_j$  of the second number to generate the array which comprises a sequence of binary numbers represented by the logical AND combinations,  $A_i$  AND  $B_j$ . The further logical combinations are carried

out by logically combining the combinations  $A_1$  AND  $B_{N-2}$ ,  $A_1$  AND  $B_{N-1}$ ,  $A_0$  AND  $B_N$ . 2, and  $A_0$  AND  $B_{N-1}$ , where N is the number of bits in the binary numbers. In this way the size of the maximal column of numbers to be added together in the array is reduced.

More specifically the array generation logic is arranged to combine the combinations  $A_1$  AND  $B_{n-2}$  and  $A_0$  AND  $B_{n-1}$  using exclusive OR logic to replace these combinations and to combine  $A_1$  AND  $B_{N-1}$  and  $A_0$  AND  $B_{n-2}$  to replace the  $A_1$  AND  $B_{n-1}$  combination.

In one embodiment of the present invention the array reduction logic can include at least one of: at least one full adder, at least one half adder, and at least one parallel counter. The or each parallel counter can comprise the parallel counter in accordance with the first aspects of the present invention.

This aspect of the present invention provides a reduction of the maximal column length in the array thereby reducing the number of steps required for array reduction. When the first aspect of the present invention is used in conjunction with the second aspect of the present invention, an even more efficient multiplication circuit is provided.

Embodiments of the present invention will now be described with reference to the accompanying drawings, in which:

Figure 1 is a schematic diagram of a full adder and a half adder in accordance with the prior art,

Figure 2 is a schematic diagram of a parallel counter using full adders in accordance with the prior art,

Figure 3 is a schematic diagram illustrating the logic modules executing the symmetric functions for the generation of binary outputs and the multiplexor (selector) used for selecting outputs,

Figure 4 is a diagram illustrating the logic for implementing the symmetric function OR\_3\_1 according to one embodiment of the present invention,

Figure 5 is a diagram illustrating the logic for implementing the symmetric function OR\_4\_1 according to one embodiment of the present invention,

Figure 6 is a diagram illustrating the logic for implementing the symmetric function OR\_5\_1 using 2 3 input OR gates according to one embodiment of the present invention,

Figure 7 is a diagram illustrating the logic for implementing the symmetric function EXOR\_7\_1 using two input exclusive OR gates according to one embodiment of the present invention,

Figure 8 is a diagram illustrating the logic for implementing the symmetric function OR\_3\_2 according to one embodiment of the present invention,

Figure 9 is a diagram illustrating the logic for implementing the symmetric function EXOR\_5\_3 according to one embodiment of the present invention,

Figure 10 is a diagram illustrating a parallel counter using the two types of symmetric functions and having seven inputs and three outputs according to one embodiment of the present invention,

Figure 11 is a diagram illustrating splitting of the symmetric function OR\_7\_2 into sub modules to allow the reusing of smaller logic blocks according to one embodiment of the present invention,

Figure 12 is a diagram of a parallel counter using the EXOR\_7\_1 symmetric function for the generation of the least significant output bit from all of the input bits, and smaller modules implementing symmetric functions using OR logic to generate the second and third output bits according to one embodiment of the present invention,

Figure 13 is a another diagram of a parallel counter similar to that of Figure 12 accept that the partitioning of the inputs is chosen differently to use different functional sub modules according to one embodiment of the present invention,

Figure 14 is a diagram schematically illustrating the binary tree organisation of the logic in a parallel counter according to a second aspect of the invention,

Figure 15 is a diagram illustrating the logic block (Block 1) for implementing the elementary symmetric functions OR\_2\_2 and OR\_2\_1 according to one embodiment of the present invention,

Figure 16 is a diagram illustrating the logic block (Block 2) for implementing the secondary symmetric functions OR\_4\_4, OR\_4\_3, OR\_4\_2 and OR\_4\_1 according to one embodiment of the present invention,

Figure 17 is a diagram illustrating the logic block (Block 3) for implementing the tertiary symmetric functions OR\_8\_8, OR\_8\_7, OR\_8\_6, OR\_8\_5, OR\_8\_4, OR\_8\_3, OR\_8\_2 and OR\_8\_1 according to one embodiment of the present invention, Figure 18 is a diagram illustrating the logic block (Block 4) for implementing the symmetric functions OR\_15\_12, OR\_15\_8 and OR\_15\_4 according to one embodiment of the present invention,

Figure 19 is a diagram illustrating the logic block (Block 5) for implementing the elementary symmetric functions EXOR\_4\_2 and OR\_4\_1 according to one embodiment of the present invention,

Figure 20 is a diagram illustrating the logic block (Block 6) for implementing the elementary symmetric functions EXOR\_15\_2 and OR\_15\_1 according to one embodiment of the present invention,

Figure 21 is a diagram schematically illustrating a parallel counter using the logic blocks of Figures 15 to 20 according to one embodiment of the present invention,
Figure 22 is a diagram of the steps used in the prior art for multiplication,
Figure 23 is a schematic diagram of the process of Figure 22 in more detail,
Figure 24 is a diagram illustrating the properties of diagonal regions in the array,
Figure 25 is a diagram illustrating array deformation in accordance with the embodiment

of the present invention and the subsequent steps of array reduction and adding, and

Figure 26 is a diagram of logic used in this embodiment for array generation.

The first aspect of the present invention will now be described.

The first aspect of the present invention relates to a parallel counter counting the number of high values in a binary number. The counter has i outputs and n inputs where i is determined as being the integer part of log<sub>2</sub> n plus 1

A mathematical basis for the first aspect of the present invention is a theory of symmetric functions. We denote by  $C^n_k$  the number of distinct k element subsets of a set of n elements. We consider two functions EXOR\_n\_k and OR\_n\_k of n variables  $X_1$ ,  $X_2$ , ...  $X_n$  given by

EXOR\_n\_k(X<sub>1</sub>, X<sub>2</sub>, ... X<sub>n</sub>) = 
$$\oplus$$
 (X<sub>i1</sub>  $\wedge$  X<sub>i2</sub>  $\wedge$  ...  $\wedge$ X<sub>ik</sub>),  
OR\_n\_k(X<sub>1</sub>, X<sub>2</sub>, ... X<sub>n</sub>) =  $\vee$  (X<sub>i1</sub>  $\wedge$  X<sub>i2</sub>  $\wedge$  ...  $\wedge$ X<sub>ik</sub>)

where (i1, i2, ... ik) runs over all possible subsets of  $\{X_1, X_2, ... X_n\}$  that contain precisely k elements. Blocks that produce such outputs are shown on figure 3.

The functions EXOR\_n\_k and OR\_n\_k are elementary symmetric functions. Their values depend only on the number of high inputs among  $X_1, X_2, X_3, \ldots X_n$ . More precisely, if m is the number of high inputs among  $X_1, X_2, X_3, \ldots X_n$  then  $OR_n_k(X_1, X_2, \ldots X_n)$  is high if and only if  $m \ge k$ . Similarly, EXOR\_n\_k(X\_1, X\_2, \ldots X\_n) is high if and only if  $m \ge k$  and  $C^m_k$  is odd.

Although EXOR\_n\_k and OR\_n\_k look similar, OR\_n\_k is much faster to produce since EXOR-gates are slower than OR-gates.

In the above representation n is the number of inputs and k is the size of the subset of inputs selected. Each set of k inputs is a unique set and the subsets comprise all possible subsets of the set of inputs. For example, the symmetric function  $OR_3_1$  has three inputs  $X_1$ ,  $X_2$  and  $X_3$  and the set size is 1. Thus the sets comprise  $X_1$ ,  $X_2$  and  $X_3$ . Each of these sets is then logically OR combined to generated the binary output. The logic for performing this function is illustrated in Figure 4.

Figure 5 illustrates the logic for performing the symmetric OR\_4\_1.

When the number of inputs become large, it may not be possible to use simple logic.

Figure 6 illustrates the use of two OR gates for implementing the symmetric function OR\_5\_1.

Figure 7 similarly illustrates the logic for performing EXOR\_7\_1. The sets comprise the inputs  $X_1$ ,  $X_2$ ,  $X_3$ ,  $X_4$ ,  $X_5$ ,  $X_6$ , and  $X_7$ . These inputs are input into three levels of exclusive OR gates.

When k is greater than 1, the inputs in a subset must be logically AND combined. Figure 8 illustrates logic for performing the symmetric function  $OR_3_2$ . The inputs  $X_1$  and  $X_2$  comprise the first set and are input to a first AND gate. The inputs  $X_1$  and  $X_3$  constitute a second set and are input to a second AND gate. The inputs  $X_2$  and  $X_3$  constitute a third set and are input to a third AND gate. The output of the AND gates are input to an OR gate to generate the output function.

Figure 9 is a diagram illustrating the logic for performing the symmetric function EXOR\_5\_3. To perform this function the subsets of size 3 for the set of five inputs comprise ten sets and ten AND gates are required. The output of the AND gates are input to an exclusive OR gate to generate the function.

The specific logic to implement the symmetric functions will be technology dependent. Thus the logic can be designed in accordance with the technology to be used.

In accordance with a first embodiment of the present invention the parallel counter of each output is generated using a symmetric function using exclusive OR logic.

Let the parallel counter have n inputs  $X_1, \ldots X_n$  and t+1 outputs  $S_i, S_{i-1}, \ldots S_0$ .  $S_0$  is the least significant bit and  $S_i$  is the most significant bit. For all i from 0 to t,

$$S_i = EXOR_n_2^i(X_1, X_2, ... X_n).$$

It can thus be seen that for a seven bit input i.e. n=7, i will have values of 0. 1 and 2. Thus to generate the output S<sub>0</sub> the function will be EXOR\_7\_1, to generate the output S<sub>1</sub> the function will be EXOR\_7\_2 and to generate the output S<sub>2</sub> the function will be EXOR\_7\_4. Thus for the least significant bit the set size (k) is 1, for the second bit the set size is 2 and for the most significant bit the set size is 4. Clearly the logic required for the more significant bits becomes more complex and thus slower to implement.

Thus in accordance with a second embodiment of the present invention, the most significant output bit is generated using a symmetric function using OR logic.

This is more practical since OR\_n\_k functions are faster than EXOR\_n\_k functions. For the most significant output bit

$$S_k = OR_n 2^t(X_1, X_2, ..., X_n)$$

In particular, with a seven-bit input

$$S_2=OR_7_4(X_1, X_2, X_3, X_4, X_5, X_6, X_7).$$

Thus in this second embodiment of the present invention the most significant bit is generated using symmetric functions using OR logic whereas the other bits are generated using symmetric functions which use exclusive OR logic.

A third embodiment will now be described in which intermediate bits are generated using symmetric functions using OR logic.

An arbitrary output bit can be expressed using OR\_n\_k functions if one knows bits that are more significant. For instance, the second most significant bit is given by

$$S_{t-1} = (S_t \wedge OR_n_2^{t} + 2^{t-1}) \vee ((\neg S_t) \wedge OR_n_2^{t-1}).$$

In particular, with a seven-bit input

$$S_1 = (S_2 \land OR_7 - 6(X_1, X_2, X_3, X_4, X_5, X_6, X_7)) \lor ((\neg S_2) \land OR_7 - 2(X_1, X_2, X_3, X_4, X_5, X_6, X_7)).$$

A further reduction is

$$S_1 = OR_7_6(X_1, X_2, X_3, X_4, X_5, X_6, X_7) \lor$$

$$((\neg S_2) \land OR_7_2(X_1, X_2, X_3, X_4, X_5, X_6, X_7)).$$

A multiplexer MU, shown in figure 3, implements this logic. It has two inputs  $X_0$ ,  $X_1$ , a control C, and an output Z determined by the formula

$$Z=(C\wedge X_1)\vee ((\neg C)\wedge X_0).$$

It is not practical to use either EXOR\_n\_k functions or OR\_n\_k functions exclusively. It is optimal to use OR\_n\_k functions for a few most significant bits and EXOR\_n\_k

functions for the remaining bits. The fastest, in TSMC.25, parallel counter with 7 inputs is shown in figure 10.

Future technologies that have fast OR\_15\_8 blocks would allow building a parallel counter with 15 inputs. A formula for the third significant bit using OR\_n\_m functions is thus:

$$\begin{split} S_{t-2} &= (S_t \wedge S_{t-1} \wedge OR\_n\_2^{t} + 2^{t-1} + 2^{t-2}) \vee (S_t \wedge (\neg S_{t-1}) \wedge OR\_n\_2^{t} + 2^{t-2}) \vee \\ & ((\neg S_t) \wedge S_{t-1} \wedge OR\_n\_2^{t-1} + 2^{t-2}) \vee ((\neg S_t) \wedge (\neg S_{t-1}) \wedge OR\_n\_2^{t-2}). \end{split}$$

A fourth embodiment of the present invention will now be described which divides the logic block implementing the symmetric function into small blocks which can be reused.

An implementation of  $OR_7_2$  is shown in figure 11. The 7 inputs are split into two groups: five inputs from  $X_1$  to  $X_5$  and two remaining inputs  $X_6$  and  $X_7$ . Then the following identity is a basis for the implementation in figure 11.

OR 7  $2(X_1,...,X_7) = OR_5_2(X_1,...,X_5) \vee$ 

$$(OR_5_1(X_1, ... X_5) \land OR_2_1(X_6, X_7)) \lor OR_2_2(X_6, X_7)$$
  
One can write similar formulas for  $OR_7_4$  and  $OR_7_6$ . Indeed,  
 $OR_7_4(X_1, ..., X_7) = OR_5_4(X_1, ..., X_5) \lor$   
 $(OR_5_3(X_1, ... X_5) \land OR_2_1(X_6, X_7)) \lor$   
 $(OR_5_2(X_1, ..., X_5) \land OR_2_2(X_6, X_7)),$   
 $OR_7_6(X_1, ..., X_7) =$   
 $(OR_5_5(X_1, ... X_5) \land OR_2_1(X_6, X_7)) \lor$   
 $(OR_5_4(X_1, ... X_5) \land OR_2_2(X_6, X_7)).$ 

Thus, it is advantageous to split variables and reuse smaller OR\_n\_k functions in a parallel counter. For instance, an implementation of a parallel counter based on partitioning seven inputs into groups of two and five is in figure 12.

Similarly, one can partition seven inputs into groups of four and three. An implementation of the parallel counter based on this partition is in figure 13. One uses the following logic formulas in this implementation.

$$OR_{-}7_{-}2(X_{1}, ..., X_{7}) = OR_{-}4_{-}2(X_{1}, X_{2}, X_{3}, X_{4}) \vee$$

$$(OR_{-}4_{-}1(X_{1}, X_{2}, X_{3}, X_{4}) \wedge OR_{-}3_{-}1(X_{5}, X_{6}, X_{7})) \vee OR_{-}3_{-}2(X_{5}, X_{6}, X_{7}),$$

$$OR_{-}7_{-}4(X_{1}, ..., X_{7}) = OR_{-}4_{-}4(X_{1}, X_{2}, X_{3}, X_{4}) \vee$$

$$(OR_{-}4_{-}3(X_{1}, X_{2}, X_{3}, X_{4}) \wedge OR_{-}3_{-}1(X_{5}, X_{6}, X_{7})) \vee$$

$$(OR_{-}4_{-}2(X_{1}, X_{2}, X_{3}, X_{4}) \wedge OR_{-}3_{-}2(X_{5}, X_{6}, X_{7})) \vee$$

$$(OR_{-}4_{-}1(X_{1}, X_{2}, X_{3}, X_{4}) \wedge OR_{-}3_{-}3(X_{5}, X_{6}, X_{7})),$$

$$OR_{-}7_{-}6(X_{1}, ..., X_{7}) =$$

$$(OR_{-}4_{-}4(X_{1}, X_{2}, X_{3}, X_{4}) \wedge OR_{-}3_{-}2(X_{5}, X_{6}, X_{7})) \vee$$

$$(OR_{-}4_{-}3(X_{1}, X_{2}, X_{3}, X_{4}) \wedge OR_{-}3_{-}3(X_{5}, X_{6}, X_{7})).$$

One needs a method to choose between the implementations in figures 12 and 13. Here is a pneumonic rule for making a choice. If one or two inputs arrive essentially later then one should use the implementation on figure 12 based on partition 7=5+2. Otherwise, the implementation on figure 13 based on partition 7=4+3 is probably optimal.

Parallel counters with 6, 5, and 4 inputs can be implemented according to the logic for the seven input parallel counter. Reducing the number of inputs decreases the area significantly and increases the speed slightly. It is advantageous to implement a six input parallel counter using partitions of 6, 3 + 3 or 4 + 2.

A preferred embodiment of the present invention will now be described with reference to figures 14 to 21.

Although it is possible to implement any OR\_n\_k or EXOR\_n\_k function in two levels of logic, the fan-out of each input is very high and the fan-in of the OR gate is also very high. It is known that both high fan-out and high fan-in contribute significantly to the delay of the circuit. It is often required that more than one OR\_n\_k or EXOR\_n\_k

function be computed from the same inputs. A two level implementation does not allow sharing of logic thus resulting in high area.

This embodiment of the present invention uses the binary tree splitting of the inputs and the logic to reduce fan-out and enable reuse of logic. Figure 14 illustrates schematically the organisation of the logic. At a first level 8 elementary logic blocks 1 are used each having two of the binary inputs and providing 2 outputs. The elementary logic blocks 1 of the first level perform elementary symmetric functions. These can either be exclusive OR symmetric functions or OR symmetric functions. At the second level four secondary logic blocks 2 each use the logic of two elementary logic blocks 1 and hence have four inputs and four outputs. The secondary logic blocks 2 perform larger symmetric functions. At the third level two tertiary logic blocks 3 each use the logic of two secondary logic blocks 2 and hence have eight inputs and eight outputs. The tertiary logic blocks 3 perform larger symmetric functions. At the fourth level the parallel counter 4 uses the logic of two tertiary logic blocks 3 and hence has sixteen inputs and sixteen outputs.

As can be seen in figure 14, the binary tree arrangement of the logic enables the logic for performing smaller symmetric functions to be used for the parallel counter. Also the arrangement provides for significant logic sharing. This significantly reduces fan-out.

As will be described in more detail, it is also possible to provide further logic sharing by using the elementary symmetric function logic for combining outputs of previous logic blocks in the binary tree.

The functions OR\_16\_8, OR\_16\_4 and OR\_16\_12 are constructed from the set of inputs X<sub>1</sub>, X<sub>2</sub>...X<sub>16</sub>. Although, the embodiment is described with OR\_n\_k functions the same construction applies to EXOR\_n\_k functions after replacing every OR gate by an EXOR gate.

The principles behind this embodiment of the invention will now be described. The function OR\_(r+s)\_t can be computed as the OR of the functions OR\_r\_k ^OR\_s\_t-k as t runs through 0,1,2...k,

$$OR_{(r+s)_t(X_1...X_{r+s})} = \bigvee_{k=0}^{t} [OR_{r_k(X_1...X_r)} \land OR_{s_k(t-k)(X_{r+1}...X_{r+s})}].$$

In an embodiment with 16 inputs, at a first level the 16 inputs are divided into 8 subsets  $-\{X_1,X_2\}$ ,  $\{X_3,X_4\}$ ,..., $\{X_{15},X_{16}\}$ , each subset containing two inputs. For each subset a logic block 1 that computes  $OR_2_1$  and  $OR_2_2$  is constructed. The 8 blocks form the first level of the tree. Since each input fans out into an OR gate and an AND gate we see that each input has a fan-out of two. Also the first layer is very regular consisting of 8 identical blocks. The logic block 1 for computing the symmetric functions  $OR_2_1$  and  $OR_2_2$  is illustrated in figure 15.

At a second level, 4 logic blocks 2 are formed by combining outputs from two adjacent logic blocks 1 at level one. These 4 blocks comprise the second layer of the tree. Each block has as inputs the outputs of two adjacent blocks from level one. The inputs are combined to form the functions OR\_4\_1, OR\_4\_2, OR\_4\_3, OR\_4\_4. The logic block 2 for computing these symmetric functions is illustrated in figure 16. The indices 1 and 2 are used in the equations below to distinguish functions formed on different subsets of the set of inputs. The symmetric functions can be represented as:

$$OR_{-4} = [OR_{-2}]_{1} \lor [OR_{-2}]_{2},$$

$$OR_{-4} = ([OR_{-2}]_{1}) \land [OR_{-2}]_{2}) \lor ([OR_{-2}]_{1} + [OR_{-2}]_{2}),$$

$$OR_{-4} = ([OR_{-2}]_{1}) \land [OR_{-2}]_{2}) \lor ([OR_{-2}]_{1}) \land [OR_{-2}]_{2},$$

$$OR_{-4} = [OR_{-2}]_{1} \land [OR_{-2}]_{2}.$$

At a third level, 2 logic blocks 3 are formed by combining outputs from two adjacent logic blocks 2 at level two. These 2 blocks comprise the third layer of the tree. Each block has as inputs the outputs of two adjacent blocks from level two. The inputs are combined to form the functions OR\_8\_1, OR\_8\_2, OR\_8\_3, OR\_8\_4, OR\_8\_5, OR\_8\_6, OR\_8\_7 and OR\_8\_8. The logic block 3 for computing these symmetric functions is illustrated in figure 17. The symmetric functions can be represented as:

$$OR_{-}8_{-}1 = [OR_{-}4_{-}1]_{1} \lor [OR_{-}4_{-}1]_{2},$$

$$OR_{-}8_{-}2 = ([OR_{-}4_{-}1]_{1} \land [OR_{-}4_{-}1]_{2}) \lor [OR_{-}4_{-}2]_{1} \lor [OR_{-}4_{-}2]_{2},$$

$$OR_{-}8_{-}3 = ([OR_{-}4_{-}1]_{1} \land [OR_{-}4_{-}2]_{2}) \lor$$

$$([OR_{-}4_{-}2]_{1} \land [OR_{-}4_{-}1]_{2}) \lor [OR_{-}4_{-}3]_{1} \lor [OR_{-}4_{-}3]_{2},$$

$$OR_{-}8_{-}4 = ([OR_{-}4_{-}1]_{1} \land [OR_{-}4_{-}3]_{2}) \lor ([OR_{-}4_{-}2]_{1} \land [OR_{-}4_{-}2]_{2}) \lor$$

$$([OR_{-}4_{-}3]_{1} \land [OR_{-}4_{-}4]_{2}) \lor ([OR_{-}4_{-}4]_{1} \lor [OR_{-}4_{-}4]_{2}),$$

$$OR_{-}8_{-}5 = ([OR_{-}4_{-}2]_{1} \land [OR_{-}4_{-}4]_{2}) \lor ([OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}2]_{2}),$$

$$OR_{-}8_{-}6 = ([OR_{-}4_{-}2]_{1} \land [OR_{-}4_{-}4]_{2}) \lor ([OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}2]_{2}),$$

$$OR_{-}8_{-}6 = ([OR_{-}4_{-}3]_{2}) \lor ([OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}2]_{2}),$$

$$OR_{-}8_{-}7 = ([OR_{-}4_{-}3]_{1} \land [OR_{-}4_{-}4]_{2}) \lor ([OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}3]_{2}),$$

$$OR_{-}8_{-}8 = [OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}3]_{2}),$$

$$OR_{-}8_{-}8 = [OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}3]_{2}),$$

$$OR_{-}8_{-}8 = [OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}3]_{2}),$$

At the final level, 3 outputs are formed by combining outputs from the two adjacent logic blocks 3 at level 3. This logic comprises the third layer of the tree. Outputs of the two adjacent blocks from level three are combined to form the functions OR\_16\_8, OR\_16\_4, and OR\_16\_12. The logic block 4 for computing these symmetric functions is illustrated in figure 18. The symmetric functions can be represented as:

$$OR_16_4 = ([OR_8_1]_1 \land [OR_8_3]_2) \lor ([OR_8_2]_1 \land [OR_8_2]_2) \lor \\ ([OR_8_3]_1 \land [OR_8_1]_2) \lor [OR_8_4]_1 \lor [OR_8_4]_2, \\ OR_16_8 = ([OR_8_1]_1 \land [OR_8_7]_2) \lor ([OR_8_2]_1 \land [OR_8_6]_2) \lor \\ ([OR_8_3]_1 \land [OR_8_5]_2) \lor ([OR_8_4]_1 \land [OR_8_4]_2) \lor ([OR_8_5]_1 \land [OR_8_3]_2) \lor \\ ([OR_8_6]_1 \land [OR_8_2]_2) \lor ([OR_8_7]_1 \land [OR_8_1]_2) \lor [OR_8_8]_1 \lor [OR_8_8]_2, \\ OR_16_12 = ([OR_8_4]_1 \land [OR_8_8]_2) \lor ([OR_8_5]_1 \land [OR_8_7]_2) \lor \\ ([OR_8_6]_1 \land [OR_8_6]_2) \lor ([OR_8_7]_1 \land [OR_8_5]_2) \lor ([OR_8_8]_1 \land [OR_8_4]_2).$$

Whilst it is possible in accordance with the invention to generate all of the outputs of the parallel counter using the outputs of the logic blocks 3, it is advantageous to determine the two least significant bits separately in parallel. This is illustrated in figures 19 and

20. Although this increases fan-out slightly, it decreases the depth of the tree thus increases the speed of the circuit.

Figure 19 is a diagram of a logic block 5 for determining the symmetric functions EXOR\_4\_2 and EXOR\_4\_1. In the determination of EXOR\_4\_2 the faster OR gate replaces an EXOR gate, according to:

EXOR\_4\_2 = ([OR\_2\_1]<sub>1</sub>^[OR\_2\_1]<sub>2</sub>)
$$\oplus$$
[OR\_2\_2]<sub>1</sub> $\oplus$ [OR\_2\_2]<sub>2</sub>=  
= ([OR\_2\_1]<sub>1</sub>^[OR\_2\_1]<sub>2</sub>) \leftarrow ([OR\_2\_2]<sub>1</sub> $\oplus$ [OR\_2\_2]<sub>2</sub>).

Four of these logic blocks 5 are provided to take the 16 inputs. Thus the logic block 5 can be considered to be a combined level 1 and 2 implementation.

Figure 20 is a diagram of a logic block 6 for determining the symmetric functions EXOR\_15\_2 and EXOR\_15\_1 which comprise the least two significant bits output from the parallel counter of this embodiment. This logic block comprises level 3 in the binary tree and it uses four of the logic blocks 5. Thus even in this parallel determination of the least significant two bits, there is reuse of logic using the binary tree structure.

Figure 21 is a diagram of the parallel converter of this embodiment of the invention in which the logic of block 4 is used to determine the most significant bits and the logic of block 6 is used to determine the least significant bits.

In the logic blocks illustrated in figures 16, 17 and 18, it can be seen that in addition to sharing logic for the inputs, the outputs of the elementary logic blocks, the secondary logic blocks and the tertiary logic blocks are input into elementary logic blocks thus providing further logic sharing. The reason for this is that OR functions are not independent. Assuming that k≥s,

$$OR_n_k \wedge OR_n_s = OR_n_k$$
,  
 $OR_n_k \vee OR_n_s = OR_n_s$ .

These formulas result in significant reductions in logic for parallel counter. The first instance of such a reduction is the following formula for the second most significant bit of a parallel counter,

$$S_{t-1} = OR\_n\_(2^t + 2^{t-1}) \vee [\neg OR\_n\_2^t \wedge OR\_n\_2^{t-1}].$$

To show the second instance of such a reduction, it is assumed that k≥s,

$$([OR_n_k]_1 \land [OR_m_s]_2) \lor ([OR_m_s]_1 \land [OR_n_k]_2) = [OR_m_s]_1 \land [OR_m_s]_2 \land ([OR_n_k]_1 \lor [OR_n_k]_2).$$

These formulas allow the reduction of fan-out by sharing certain logic. As shown on block 2, the functions OR\_4\_2 and OR\_4\_3 are implemented by three levels of shared logic,

$$OR_{-4} = [OR_{-2}_{-1}]_{1} \lor [OR_{-2}_{-1}]_{2},$$

$$OR_{-4}_{-2} = ([OR_{-2}_{-1}]_{1} \land [OR_{-2}_{-1}]_{2}) \lor [OR_{-2}_{-2}]_{1} \lor [OR_{-2}_{-2}]_{2},$$

$$OR_{-4}_{-3} = [OR_{-2}_{-1}]_{1} \land [OR_{-2}_{-1}]_{2} \land ([OR_{-2}_{-2}]_{1} \lor [OR_{-2}_{-2}]_{2}),$$

$$OR_{-4}_{-4} = [OR_{-2}_{-2}]_{1} \land [OR_{-2}_{-2}]_{2}.$$

Block 3 is a circuit implementing logic of level three. The reductions afford the following expressions for functions OR\_8\_1, OR\_8\_2, OR\_8\_3, OR\_8\_4, OR\_8\_5, OR\_8\_6, OR\_8\_7, and OR\_8\_8,

$$OR_{-}8_{-}1 = [OR_{-}4_{-}1]_{1} \lor [OR_{-}4_{-}1]_{2},$$

$$OR_{-}8_{-}2 = ([OR_{-}4_{-}1]_{1} \land [OR_{-}4_{-}1]_{2}) \lor ([OR_{-}4_{-}2]_{1} \lor [OR_{-}4_{-}2]_{2}),$$

$$OR_{-}8_{-}3 = [([OR_{-}4_{-}1]_{1} \land [OR_{-}4_{-}1]_{2}) \land$$

$$([OR_{-}4_{-}2]_{1} \lor [OR_{-}4_{-}2]_{2})] \lor [OR_{-}4_{-}3]_{1} \lor [OR_{-}4_{-}3]_{2},$$

$$OR_{-}8_{-}4 = [([OR_{-}4_{-}1]_{1} \land [OR_{-}4_{-}1]_{2}) \land ([OR_{-}4_{-}3]_{1} \lor [OR_{-}4_{-}3]_{2})] \lor$$

$$([OR_{-}4_{-}2]_{1} \land [OR_{-}4_{-}2]_{2}) \lor [OR_{-}4_{-}4]_{1} \lor [OR_{-}4_{-}4]_{2},$$

$$OR_{-}8_{-}5 = [([OR_{-}4_{-}1]_{1} \land [OR_{-}4_{-}1]_{2}) \land ([OR_{-}4_{-}4]_{1} \lor [OR_{-}4_{-}4]_{2})] \lor$$

$$[([OR_{-}4_{-}2]_{1} \land [OR_{-}4_{-}2]_{2}) \land ([OR_{-}4_{-}3]_{1} \lor [OR_{-}4_{-}3]_{2}),$$

$$OR_{-}8_{-}6 = [([OR_{-}4_{-}2]_{1} \land [OR_{-}4_{-}2]_{2}) \land ([OR_{-}4_{-}3]_{2}),$$

$$OR_{-}8_{-}7 = ([OR_{-}4_{-}3]_{1} \land [OR_{-}4_{-}3]_{2}),$$

$$OR_{-}8_{-}7 = ([OR_{-}4_{-}3]_{1} \land [OR_{-}4_{-}3]_{2}),$$

$$([OR_{-}4_{-}4]_{1} \lor [OR_{-}4_{-}4]_{2}),$$

$$OR_{-}8_{-}8 = [OR_{-}4_{-}4]_{1} \land [OR_{-}4_{-}4]_{2}.$$

Block 4 is a circuit implementing logic for level 4. The implementation of functions OR\_16\_8, OR\_16\_4, and OR\_16\_12 follows reduced formulas,

$$\begin{split} \text{OR}_{-1}6\_4 &= \left[ \left( \left[ \text{OR}_{-8}_{-1} \right]_{1} \right) \left( \left[ \text{OR}_{-8}_{-3} \right]_{1} \vee \left[ \text{OR}_{-8}_{-3} \right]_{2} \right) \right] \vee \\ & \quad \left( \left[ \text{OR}_{-8}_{-2} \right]_{1} \right) \left( \left[ \text{OR}_{-8}_{-4} \right]_{1} \vee \left[ \text{OR}_{-8}_{-4} \right]_{2} \right) \\ & \quad \left( \left[ \text{OR}_{-8}_{-1} \right]_{1} \right) \left( \left[ \text{OR}_{-8}_{-7} \right]_{1} \vee \left[ \text{OR}_{-8}_{-7} \right]_{2} \right) \right] \vee \\ & \quad \left[ \left( \left[ \text{OR}_{-8}_{-2} \right]_{1} \right) \left( \left[ \text{OR}_{-8}_{-2} \right]_{2} \right) \left( \left[ \text{OR}_{-8}_{-6} \right]_{1} \vee \left[ \text{OR}_{-8}_{-6} \right]_{2} \right) \right] \vee \\ & \quad \left[ \left( \left[ \text{OR}_{-8}_{-3} \right]_{1} \right) \left( \left[ \text{OR}_{-8}_{-3} \right]_{2} \right) \left( \left[ \text{OR}_{-8}_{-5} \right]_{2} \right) \right] \vee \\ & \quad \left( \left[ \text{OR}_{-8}_{-4} \right]_{1} \right) \left( \left[ \text{OR}_{-8}_{-4} \right]_{2} \right) \wedge \left( \left[ \text{OR}_{-8}_{-8} \right]_{1} \vee \left[ \text{OR}_{-8}_{-8} \right]_{2} \right) \\ & \quad \left( \left[ \text{OR}_{-8}_{-6} \right]_{2} \right) \vee \left( \left[ \text{OR}_{-8}_{-5} \right]_{1} \right) \left( \left[ \text{OR}_{-8}_{-5} \right]_{2} \right) \wedge \left( \left[ \text{OR}_{-8}_{-7} \right]_{1} \vee \left[ \text{OR}_{-8}_{-7} \right]_{2} \right) \right] . \end{split}$$

The binary tree principle of this embodiment of the present invention can be implemented using either OR or EXOR symmetric functions. When using EXOR symmetric functions there is a reduction in logic which applies. Assume that  $k = \sum_{i \in S} 2^i$  where S is a set of natural numbers uniquely determined by k as a set of positions of ones in the binary representation of k. Then

$$EXOR_n_k = AND_{i \in S} EXOR_n_2^i$$
.

Thus, designing a circuit computing EXOR\_n\_k, one gets away with computing only functions EXOR\_n\_2<sup>i</sup> on subsets and thus although EXOR logic is slower, there is less fan-out than when using OR logic.

As can be seen in figure 21, the most efficient circuit can comprise a mixture of OR and EXOR symmetric function logic circuits.

During the design of the parallel counter there is also a further possibility to save logic. There is a useful formula,

$$OR_n_k(X_1...X_n) = \neg OR_n_(n+1-k)(\neg X_1...\neg X_n).$$

Thus if a library contains a fast module generating OR\_4\_3 then this module can be used with inverters to generate OR\_4\_2. The opposite observation holds as well: an OR 4 2 module enables the generation of OR\_4\_3.

Further reductions can be applied to logic for a parallel counter based on OR elementary symmetric functions. For instance, the third significant bit admits the expression

$$S_{t-2} = OR_n_{(2^{t}+2^{t-1}+2^{t-2})} \vee [\neg OR_n_{(2^{t}+2^{t-1})} \wedge OR_n_{(2^{t}+2^{t-2})}] \vee [\neg OR_n_{(2^{t}+2^{t-2})}] \vee [\neg OR_n_{(2^{t-1}+2^{t-2})}] \vee [\neg OR_n_{(2^{t-1}+2^{t-2})}$$

Another important application of reductions is logic for a conditional parallel counter. A conditional parallel counter is a module with n inputs. Let Q be a subset of  $\{0,1...n\}$ . The subset determines a condition. The module produces the binary representation of the number of high inputs if this number of high inputs belongs to Q. If the number of high inputs does not belong to Q, the outputs can be any logical function. Such a module can replace a parallel counter if the number of high inputs is in Q.

A useful conditional parallel counter has  $Q=\{0,1...m\}$  for some  $m \le n$ . Logic for such a counter can be obtained from logic for a parallel counter with m inputs by replacing every  $OR_m_k$  with  $OR_n_k$ . For instance, if  $Q=\{0,1,2,3\}$  then a conditional parallel counter has 2 outputs  $S_1$ ,  $S_0$  given by

$$S_1 = OR_n_2, S_0 = EXOR_n_1.$$

Another instance of a conditional parallel counter has  $Q=\{0,1,2,3,4,5\}$ ,

$$S_2 = OR \ n \ 4$$
,  $S_1 = \neg OR \ n \ 4 \land OR \ n \ 2$ ,  $S_0 = EXOR \ n \ 1$ .

If the number of high inputs for one of these two counters does not belong to Q then the output is the binary representation of the greatest element of Q, i.e., 3=11 or 5=101.

An important application of conditional parallel counters is constant multipliers. A constant multiplier is a module whose inputs form binary representations of two numbers A, B, and outputs comprise the binary representation of the product A\*B whenever A is a number that belongs to a set of allowed constants. Since constant multipliers are smaller and faster then multipliers, it is beneficial to use them whenever

one can choose one multiplicand from the set of allowed constants. One can do it, for instance, designing a digital filter.

Another aspect of the present invention comprises a technique for multiplication and this will be described hereinafter.

Multiplication is a fundamental operation in digital circuits. Given two n-digit binary numbers

$$A_{n-1}2^{n-1}+A_{n-2}2^{n-2}+\ldots+A_12+A_0 \text{ and } B_{n-1}2^{n-1}+B_{n-2}2^{n-2}+\ldots+B_12+B_0,$$
 since the state of

their product

$$P_{2n-1}2^{2n-1}+P_{2n-2}2^{2n-2}+...+P_12+P_0$$

has up to 2n digits. Logical circuits generating all P<sub>i</sub> as outputs generally follow the scheme in figure 14. Wallace has invented the first fast architecture for a multiplier, now called the Wallace-tree multiplier (Wallace, C. S., A Suggestion for a Fast Multiplier, IEEE Trans. Electron. Comput. EC-13: 14-17 (1964)). Dadda has investigated bit behaviour in a multiplier (L. Dadda, Some Schemes for Parallel Multipliers, Alta Freq 34: 349-356 (1965)). He has constructed a variety of multipliers and most multipliers follow Dadda's scheme.

Dadda's multiplier uses the scheme in on figure 22. If inputs have 8 bits then 64 parallel AND gates generate an array shown in figure 23. The AND gate sign  $\wedge$  is omitted for clarity so that  $A_i \wedge B_j$  becomes  $A_i B_j$ . The rest of figure 23 illustrates array reduction that involves full adders (FA) and half adders (HA). Bits from the same column are added by half adders or full adders. Some groups of bits fed into a full adder are in rectangles. Some groups of bits fed into a half adder are in ovals. The result of array reduction is just two binary numbers to be added at the last step. One adds these two numbers by one of fast addition schemes, for instance, conditional adder or carry-look-ahead adder.

This aspect of the present invention comprises two preferred steps: array deformation and array reduction using the parallel counter with the accordance with the first aspect of the present invention.

The process of array deformation will now be described.

Some parts of the multiplication array, formed by  $A_iB_j$  such as on figure 23, have interesting properties. One can write simple formulas for the sum of the bits in these parts. Examples of such special parts are on figure 24. In general, chose an integer k, and those  $A_iB_j$  in the array such that the absolute value of i-j-k is less or equal to 1 comprise a special part.

Let S<sub>i</sub> be the bits of the sum of all the bits of the form A<sub>i</sub>B<sub>i</sub> shown on figure 1. Then

$$S_0 = A_0 \wedge B_0,$$

$$S_1 = (A_1 \wedge B_0) \oplus (A_0 \wedge B_1),$$

$$S_2 = (A_1 \wedge B_1) \oplus (A_1 \wedge B_1 \wedge A_0 \wedge B_0),$$

$$S_{2k+1} = (A_{k+1} \wedge B_k) \oplus (A_k \wedge B_{k+1}) \oplus (A_k \wedge B_k \wedge A_{k-1} \wedge B_{k-1})$$
for all  $k > 0$ ,
$$S_{2k} = (A_k \wedge B_k) \oplus (A_{k-1} \wedge B_{k-1} \wedge (A_{k+1} \vee B_{k+1})))$$
for all  $k > 1$ .

These formulas show that the logic for summing the chosen entries in the array does not get large. Whereas if random numbers were summed the logic for the  $(n + 1)^{th}$  bit is larger than the logic for the  $n^{th}$  bit.

Using these formulas, one can generate a different array. The shape of array changes. This is why it is called array deformation. These formulas are important because one can speed up a multiplication circuit by generating an array of a particular shape.

The array in figure 25 is for an 8-bit multiplication. The AND gate sign  $\wedge$  is omitted for clarity so that  $A_i \wedge B_j$  becomes  $A_i B_j$ . Array deformation logic generates X, Y, and Z:

$$X = (A_1 \wedge B_6) \oplus (A_0 \wedge B_7),$$

$$Y = A_1 \wedge B_7 \wedge \neg (A_0 \wedge B_6),$$

$$Z = A_1 \wedge B_7 \wedge A_0 \wedge B_6.$$

The advantage of this array over one in figure 23 is that the maximal number of bits in a column is smaller. The array in figure 23 has a column with 8 bits. The array on figure 25 has 4 columns with 7 bits but none with 8 or more bits. The logic for the generation of X Y and Z is illustrated in figure 26. This logic can be used in parallel with the first two full adders (illustrated in Figure 2) in the array reduction step thus avoiding delays caused by additional logic.

Array reduction is illustrated in figure 25. The first step utilizes 1 half adder, 3 full adders, 1 parallel counter with 4 inputs, 2 parallel counters with 5 inputs, 1 parallel counters with 6 inputs, and 4 parallel counters with 7 inputs. The three parallel counters (in columns 7, 8, and 9) have an implementation based on 7=5+2 partition. The bits X, Y, and Z join the group of two in the partition. The counter in column 6 is implemented on 7=4+3 partition. The counter in column 5 is based on 6=3+3 partition. The remaining counters should not be partitioned. The locations of full adders are indicated by ovals. The half adder is shown by a rectangle.

An adder for adding the final two binary numbers is designed based on arrival time of bits in two numbers. This gives a slight advantage but it is based on common knowledge, that is conditional adder and ripple-carry adder.

Although in this embodiment the addition of two 8 bit numbers has been illustrated, the invention is applicable to any N bit binary number addition. For example for 16 bit addition, the array reduction will reduce the middle column height from 16 to 15 thus allowing two seven bit full adders to be used for the first layer to generate two 3 bit outputs and the left over input can be used with the other two 3 outputs as an input to a further seven input full adder thus allowing the addition of the 16 bits in only two layers.

This aspect of the present invention can be used with the parallel counter of the first aspects of the present invention to provide a fast circuit.

The parallel counter of the first aspects of the present invention has other applications, other than used in the multiplier of one aspect of the present invention. It can be used in RSA and reduced area multipliers. Sometimes, it is practical to build just a fragment of the multiplier. This can happen when the array is too large, for instance in RSA algorithms where multiplicands may have more than more than 1000 bits. This fragment of a multiplier is then used repeatedly to reduce the array. In current implementations, it consists of a collection of full adders. One can use 7 input parallel counters followed by full adders instead.

A parallel counter can also be used in circuits for error correction codes. One can use a parallel counter to produce Hamming distance. This distance is useful in digital communication. In particular the Hamming distance has to be computed in certain types of decoders, for instance, the Viterbi decoder or majority-logic decoder.

Given two binary messages  $(A_1, A_2, ... A_n)$  and  $(B_1, B_2, ... B_n)$ , the Hamming distance between them is the number of indices i between 1 and n such that  $A_i$  and  $B_i$  are different. This distance can be computed by a parallel counter whose n inputs are

$$(A_1 \oplus B_1, A_2 \oplus B_2, ... A_n \oplus B_n).$$

The multiply-and-add operation is fundamental in digital electronics because it includes filtering. Given 2n binary numbers  $X_1, X_2, ... X_n, Y_1, Y_2, ... Y_n$ , the result of this operation is

$$X_1Y_1 + X_2Y_2 + ... + X_nY_n$$

One can use the multiplier described to implement multiply-and-add in hardware. Another strategy can be to use the scheme in figure 22. All partial products in products  $X_iY_i$  generate an array. Then one uses the parallel counter X to reduce the array.

In the present invention, one can use the parallel counter whenever there is a need to add an array of numbers. For instance, multiplying negative number in two-complement form, one generates a different array by either Booth recording (A. D. Booth, A Signed

Binary Multiplication Technique, Q. J. Mech. Appl. Math. 4: 236-240 (1951)) or another method. To obtain a product one adds this array of numbers.

#### **CLAIMS:**

- 1. A parallel counter comprising:
  - a plurality of inputs for receiving a binary number as a plurality of binary inputs;
- a plurality of outputs for outputting binary code indicating the number of binary ones in the plurality of binary inputs; and
- a logic circuit connected between the plurality of inputs and the plurality of outputs and for generating each of the plurality of binary outputs as a symmetric function of the binary inputs;

wherein said logic circuit is divided into a plurality of logic units, each logic unit is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to the logic unit, the binary inputs of said plurality of inputs are divided into inputs into a plurality of said logic units, and the binary outputs of said plurality of outputs are generated using binary outputs of a plurality of said logic units.

- 2. A parallel counter according to claim 1, wherein the binary inputs of said plurality of inputs are divided according to a binary tree into inputs into a plurality of said logic units.
- 3. A parallel counter according to claim 1 or claim 2, wherein said logic units are arranged to receive 2<sup>n</sup> of said binary inputs, where n is an integer indicating the level of the logic units in the binary tree, said logic circuit has m logic units at each level, where m is a rounded up integer determined from (the number of binary inputs)/2<sup>n</sup>, logic units having a higher level in the binary tree comprise logic of logic units at lower levels in the binary tree, and each logic unit is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to the logic unit.
- 4. A parallel counter according to claim 3, wherein each logic unit at the first level is arranged to generate logic unit binary outputs as an elementary symmetric function of the binary inputs to said logic circuit.

- 5. A parallel counter according to claim 4, wherein each logic unit at the first level is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to said logic circuit using OR logic for combining the binary inputs.
- 6. A parallel counter according to claim 5 wherein each logic unit at the first level is arranged to logically AND each of the binary inputs to the logic unit and to logically OR each of the binary inputs to the logic unit to generate the logic unit binary outputs.
- 7. A parallel counter according to claim 4, wherein each logic unit at the first level is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to said logic circuit using exclusive OR logic for combining the binary inputs.
- 8. A parallel counter according to claim 7 wherein each logic unit at the first level is arranged to logically AND each of the binary inputs to the logic unit and to logically exclusively OR each of the binary inputs to the logic unit to generate the logic unit binary outputs.
- 9. A parallel counter according to any one of claims 3 to 8, wherein elementary logic units are provided as the logic units at the first level for performing elementary symmetric functions, outputs from each of two primary elementary logic units receiving four logically adjacent binary inputs from said plurality of inputs are input to two secondary elementary logic units, an output from each of the secondary elementary logic units is input to a tertiary elementary logic unit, and said primary, secondary and tertiary elementary logic units form a secondary logic unit at a second level of the binary tree having a binary output comprising a binary output from each of said secondary elementary logic units and two binary outputs from said tertiary elementary logic unit.
- 10. A parallel counter according to claim 9, wherein tertiary logic units at a third level of the binary tree each comprise two secondary logic units receiving eight logically adjacent binary inputs from said plurality of inputs, four elementary logic units receiving as inputs the outputs of said two secondary logic units, and further logic for generating

binary outputs as a symmetric function of the binary inputs to said tertiary logic unit using the binary outputs of said four elementary logic units.

- 11. A parallel counter according to claim 10, wherein quaternary logic units at a fourth level of the binary tree each comprise two tertiary logic units receiving sixteen logically adjacent binary inputs from said plurality of inputs, four elementary logic units receiving as inputs the outputs of said two tertiary logic units, and further logic for generating binary outputs as a symmetric function of the binary inputs to said quaternary logic unit using the binary outputs of said four elementary logic units
- 12. A parallel counter according to any one of claims 3 to 8, wherein elementary logic units are provided as the logic units at the first level for performing elementary symmetric functions, and logic units for higher levels are comprised of logic units of lower levels.
- 13. A parallel counter according to claim 12, wherein said logic units for higher levels above the second level comprise logic units of an immediately preceding level and elementary logic units.
- 14. A parallel counter according to any one of claims 3 to 13, wherein each logic unit at each level is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to said logic circuit using OR logic for combining the binary inputs.
- 15. A parallel counter according to any one of claims 3 to 13, wherein each logic unit at each level is arranged to generate logic unit binary outputs as a symmetric function of the binary inputs to said logic circuit using exclusive OR logic for combining the binary inputs.
- 16. A logic circuit for multiplying two binary numbers comprising:

array generation logic for generating an array of binary numbers comprising all possible combinations of each bit of each binary number;

array reduction logic including at least one parallel counter according to any preceding claim for reducing the number of combinations in the array; and binary addition logic for adding the reduced combinations to generate an output.

17. A method of designing a logic circuit comprising: providing a library of logic module designs each for performing a small symmetric function;

designing a logic circuit to perform a large symmetric function;

identifying small symmetric functions which can perform said symmetric function; selecting logic modules from said library to perform said small symmetric functions; identifying a logic circuit in the selected logic circuit which performs a symmetric function and which can be used to perform another symmetric function; selecting the logic circuit corresponding to the identified symmetric function and using

selecting the logic circuit corresponding to the identified symmetric function and using the selected logic circuit with inverters to perform said other symmetric function using the relationship between the symmetric functions:

$$OR_n_k(X_1...X_n) = \neg OR_n_(n+1-k)(\neg X_1...\neg X_n)$$

where  $\neg$  denotes an inversion, n is the number of inputs, and k is the number of sets of inputs AND combined together.

- 18. A conditional parallel counter having m possible high inputs out of n inputs, where m < n, and n and m are integers, the counter comprising the parallel counter according to any one of claims 1 to 15 for counting inputs to generate p outputs for m inputs, wherein the number n of inputs to the counter is greater than  $2^p$ .
- 19. A constant multiplier comprising the conditional parallel counter according to claim 18.
- 20. A digital filter comprising a constant multiplier according to claim 19.

21. A parallel counter comprising:

a plurality of inputs for receiving a binary number as a plurality of binary inputs; a plurality of outputs for outputting binary code indicating the number of binary

ones in the plurality of binary inputs; and

a logic circuit connected between the plurality of inputs and the plurality of outputs and for generating each of the plurality of binary outputs as a symmetric function of the binary inputs.

- 22. A parallel counter according to claim 21 wherein said logic circuit is arranged to generate at least one of the binary outputs as a symmetric function of the binary inputs using exclusive OR logic for combining a plurality of sets of one or more binary inputs.
- 23. A parallel counter according to claim 22 wherein said logic circuit is arranged to logically AND members of each set of binary inputs and to logically exclusively OR the result of the AND operations.
- 24. A parallel counter according to claim 23 wherein said logic circuit is arranged to logically AND 2<sup>i</sup> of the binary inputs in each set for the generation of the i<sup>th</sup> binary output, where i is an integer from 1 to N, N is the number of binary outputs and i represents the significance of each binary output, each set being unique and the sets covering all possible combinations of binary inputs.
- 25. A parallel counter according to claim 23 wherein said logic circuit is arranged to logically AND members of each set of binary inputs, where each set is unique and the sets cover all possible combinations of binary inputs.
- 26. A parallel counter according to any one of claims 21 to 25 wherein said logic circuit is arranged to generate at least one of the binary outputs as a symmetric function of the binary inputs using OR logic for combining a plurality of sets of one or more binary inputs.

- 27. A parallel counter according to claim 26 wherein said logic circuit is arranged to logically AND members of each set of binary inputs and to logically OR the result of the AND operations.
- 28. A parallel counter according to claim 27 wherein said logic circuit is arranged to logically AND 2<sup>N-1</sup> of the binary inputs in each set for the generation of the N<sup>th</sup> binary output, where N is the number of binary outputs and the N<sup>th</sup> binary output is the most significant, each set being unique and the sets covering all possible combinations of binary inputs.
- 29. A parallel counter according to claim 27 wherein said logic circuit is arranged to logically AND members of each set of binary inputs, where each set is unique and the sets cover all possible combinations of binary inputs.
- 30. A parallel counter according to claim 21 wherein said logic circuit is arranged to generate a first binary output as a symmetric function of the binary inputs using exclusive OR logic for combining a plurality of sets of one or more binary inputs, and to generate an N<sup>th</sup> binary output as a symmetric function of the binary inputs using OR logic for combining a plurality of sets of one or more binary inputs.
- 31. A parallel counter according to any one of claims 21 to 30 wherein said logic circuit is arranged to generate two possible binary outputs for a binary output less significant than the N<sup>th</sup> binary output, as symmetric functions of the binary inputs using OR logic for combining a plurality of sets of one or more binary inputs where N is the number of binary outputs, the sets used for each possible binary output being of two different sizes which are a function of the binary output being generated; and said logic circuit including selector logic to select one of the possible binary outputs based on a more significant binary output value.
- 32. A parallel counter according to claim 31 wherein said logic circuit is arranged to generate the two possible binary outputs for the (N-1)<sup>th</sup> binary output less significant

than the N<sup>th</sup> binary output, as symmetric functions of the binary inputs using OR logic for combining a plurality of sets of one or more binary inputs, the sets used for each possible binary output being of size  $2^{N-1} + 2^{N-2}$  and  $2^{N-2}$  respectively and said selector logic being arranged to select one of the possible binary outputs based on the N<sup>th</sup> binary output value.

- 33. A parallel counter according to any one of claims 21 to 32 wherein said logic circuit includes a plurality of subcircuit logic modules each generating intermediate binary outputs as a symmetric function of some of the binary inputs, and logic for logically combining the intermediate binary outputs to generate said binary outputs.
- 34. A parallel counter according to claim 33 wherein said subcircuit logic modules are arranged to use OR logic for combining sets of said some of said binary inputs.
- 35. A parallel counter according to claim 34 wherein said logic circuit includes one or more logic modules each for generating a binary output as a symmetric function of the binary inputs using executive OR logic for combining a plurality of sets of one or more binary inputs.
- 36 A logic circuit for multiplying two N bit binary numbers, the logic circuit comprising:

array generation logic for performing the logical AND operation between each bit in one binary number and each bit in the other binary number to generate an array of logical AND combination comprising an array of binary values, and for further logically combining values to generate the array in which the maximal depth of the array is below N bits;

array reduction logic for reducing the depth of the array to two binary numbers; and

addition logic for adding the binary values of the two binary numbers.

- 37. A logic circuit according to claim 36 wherein said array generation logic is arranged to perform the further logical combination of values for values formed by the logical AND combination of each bit  $A_i$  of one binary number and each bit  $B_j$  of the other binary number, where i-j-k  $\leq$ 1, k is a chosen integer, and i and j are integers from 1 to N.
- 38. A logic circuit according to claim 36 or claim 37 wherein said array generation logic is arranged to logically AND combine each bit A<sub>i</sub> of the first binary number with each bit B<sub>j</sub> of a second binary number to generate said array comprising a sequence of binary numbers represented by said logical AND combinations, A<sub>i</sub> AND B<sub>j</sub> and to carry out further logical combination by logically combining the combination A<sub>1</sub> AND B<sub>N-2</sub>, A<sub>1</sub> AND B<sub>N-1</sub> where N is the number of bits in the binary numbers.
- 39. A logic circuit according to claim 38 wherein said array generation logic is arranged to combine the combinations A<sub>1</sub> AND B<sub>N-2</sub> and A<sub>0</sub> AND B<sub>N-1</sub>, using exclusive OR logic to replace these combinations, and to combine A<sub>1</sub> AND B<sub>N-1</sub> and A<sub>0</sub> AND B<sub>N-2</sub> to replace the A<sub>1</sub> AND B<sub>N-1</sub> combination.
- 40. A logic circuit according to any one of claims 36 to 39 wherein said array reduction logic includes at least one of: at least one full adder, at least one half adder, and at least one parallel counter.
- 41. A logic circuit according to claim 40 wherein said array reduction logic includes at least one parallel counter according to any one of claims 1 to 16 or 21 to 35.







Application No: Claims searched: GB 0101961.1 1-16, 18-35, 41 Examiner:

Matthew J. Tosh

Date of search:

9 April 2001

### Patents Act 1977 Search Report under Section 17

#### Databases searched:

UK Patent Office collections, including GB, EP, WO & US patent specifications, in:

UK Cl (Ed.S): G4A (AGB, AMD), H3P (PHX)

Int Cl (Ed.7): G06F 7/52, 7/60, H03K 19/20, 19/23

ONLINE: ELSEVIER, EPODOC, IEL, INSPEC, JAPIO, TDB, WPI Other:

### Documents considered to be relevant:

| Category | Identity of document and relevant passage                                                                                                                                      |                                                    | Relevant<br>to claims |
|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------|-----------------------|
| A        | EP 0741354 A2                                                                                                                                                                  | (CANON). See lines 25-27, page 5 and Figures 1-3B. |                       |
| A        | US 5995029                                                                                                                                                                     | (HYUNDAI). See whole document.                     |                       |
| A        | Parallel Counter Implementation, Jones R. F., Swartzlander E. E. Jr., Conf. Rec. 26th Asilomar Conf. Signals, Systems & Computers 1992, Vol. 1, pp381-385, ISBN 0-8186-3160-0. |                                                    |                       |

- Member of the same patent family
- Document indicating technological background and/or state of the art. Document published on or after the declared priority date but before the filing date of this invention.
- Patent document published on or after, but with priority date earlier than, the filing date of this application.

Document indicating lack of novelty or inventive step

Document indicating lack of inventive step if combined with one or more other documents of same category.







Application No:

GB 0101961.1

Claims searched:

17

Examiner:

Matthew J. Tosh

Date of search:

15 May 2001

## Patents Act 1977 Further Search Report under Section 17

#### Databases searched:

UK Patent Office collections, including GB, EP, WO & US patent specifications, in:

UK Cl (Ed.S): G4A (AMD, AGB); H3P (PHX)

Int Cl (Ed.7): G06F 7/50, 7/52, 7/60; H03K 19/20, 19/23

Other: ONLINE: ELSEVIER, EPODOC, IEL, INSPEC, JAPIO, TDB, WPI

#### Documents considered to be relevant:

| Category | Identity of document and relevant passage |   |  |
|----------|-------------------------------------------|---|--|
|          | NONE.                                     |   |  |
|          |                                           | l |  |

- X Document indicating lack of novelty or inventive step
   Y Document indicating lack of inventive step if combined with one or more other documents of same category.
- & Member of the same patent family
- A Document indicating technological background and/or state of the art.

  P Document published on or after the declared priority date but before the filing date of this invention.
- E Patent document published on or after, but with priority date earlier than, the filing date of this application.







Application No:

GB 0101961.1

Claims searched: 36-40

Examiner:

Matthew J. Tosh

Date of search: 15 M

15 May 2001

# Patents Act 1977 Further Search Report under Section 17

#### Databases searched:

UK Patent Office collections, including GB, EP, WO & US patent specifications, in:

UK Cl (Ed.S): G4A (AMD); H3P (PHX)

Int Cl (Ed.7): G06F 7/52, 7/60; H03K 19/20

Other: ONLINE: ELSEVIER, EPODOC, IEL, INSPEC, JAPIO, TDB, WPI

#### Documents considered to be relevant:

| Category | Identity of document and relevant passage |                                                                                    | Relevant<br>to claims |
|----------|-------------------------------------------|------------------------------------------------------------------------------------|-----------------------|
| A<br>A   | US 5095457<br>FR 2475250 A1               | (SAMSUNG). See whole document.  (MATÉRIAL TÉLÉPHONIQUE THOMSON). See WPI abstract. |                       |

- & Member of the same patent family
- A Document indicating technological background and/or state of the art.

  Document published on or after the declared priority date but before the filing date of this invention.
- E Patent document published on or after, but with priority date earlier than, the filing date of this application.

X Document indicating lack of novelty or inventive step
 Y Document indicating lack of inventive step if combined with one or more other documents of same category.